EcID. A database for the inference of functional interactions in E. coli

Authors: A. Valencia, Altschul, B. Garcia, Bowers, Chenna, D. Juan, Dandekar, E. Andres Leon, Edgar, Gaasterland, Goh, Hermjakob, Hoffmann, I. Ezkurdia, Ibanez-Ruiz, Kersey, Keseler, Marcotte, Nitschk, Pazos, Pellegrini, Peterson, Shannon, Tamames
Publication venue: Oxford University Press
Publication date: 01/01/2009

The EcID database (Escherichia coli Interaction Database) provides a framework for integrating information on functional interactions extracted from the following sources: EcoCyc (metabolic pathways, protein complexes and regulatory information), KEGG (metabolic pathways), and MINT and IntAct (protein interactions). It also includes information on protein complexes from the two E. coli high-throughput pull-down experiments, as well as potential interactions extracted from the literature using the web services associated with the iHOP text-mining system. Additionally, EcID incorporates the results of several prediction methods: two protein interaction prediction methods based on genomic information (Phylogenetic Profiles and Gene Neighbourhoods) and three methods based on the analysis of co-evolution (Mirror Tree, In Silico 2 Hybrid and Context Mirror). EcID assigns a specifically developed confidence score to each prediction. The two main features that distinguish EcID from other systems are the combination of co-evolution-based predictions with experimental data and the inclusion of E. coli-specific information, such as gene regulation information from EcoCyc. The possibilities offered by combining the information in EcID are illustrated by predicting potential functions for a group of poorly characterized genes related to yeaG. EcID is available online at http://ecid.bioinfo.cnio.es
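Mirror Tree, one of the co-evolution methods listed above, rests on a simple idea: if two protein families interact, their inter-species distance matrices tend to be correlated. The sketch below illustrates only that general idea, assuming the two distance matrices have already been computed over the same set of species; it is not EcID's code, does not reproduce its confidence score, and the helper names and toy distances are hypothetical.

```python
import math
from itertools import combinations


def symmetric(pairs):
    """Build a nested, symmetric distance dict from {(a, b): distance} entries (hypothetical helper)."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def upper_triangle(dist, species):
    """Flatten distances for every unordered species pair, in a fixed order."""
    return [dist[a][b] for a, b in combinations(species, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant distance vector: correlation undefined")
    return cov / (sx * sy)


def mirror_tree_score(dist_a, dist_b, species):
    """Correlate the inter-species distance matrices of two protein families.

    Both families must have a member in every listed species; a high
    correlation is read as a hint of co-evolution (and possible interaction).
    """
    return pearson(upper_triangle(dist_a, species), upper_triangle(dist_b, species))


species = ["s1", "s2", "s3", "s4"]  # hypothetical species shared by both families
family_a = {("s1", "s2"): 0.10, ("s1", "s3"): 0.40, ("s1", "s4"): 0.50,
            ("s2", "s3"): 0.35, ("s2", "s4"): 0.45, ("s3", "s4"): 0.20}
family_b = {pair: 2 * d for pair, d in family_a.items()}  # same tree shape, faster-evolving
print(mirror_tree_score(symmetric(family_a), symmetric(family_b), species))
```

Because Pearson correlation ignores overall scale, a family that evolves uniformly faster still scores close to 1; real implementations also have to handle missing orthologs and background correlation from the species tree, which this sketch ignores.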

Crossref

PubMed Central

Universidad Carlos III de Madrid e-Archivo

The efficacy of various machine learning models for multi-class classification of RNA-seq expression data

Authors: A Statnikov, AT Azar, G Bartsch, G Sanz, H Rhee, I Ezkurdia, J Friedman, J Meng, J Zhi, JN Weinstein, L Breiman, M Al-Rajab, M Villamizar, MD Podolsky, MS Lawrence, ND Khalilabad, P Geurts, R Díaz-Uriarte, S Bram Ednersson, S Tarek, T Cover, X Li, Y Perez-Riverol, Y Shang, Y Tan, Z Ye
Publication venue: Springer Science and Business Media LLC
Publication date: 19/08/2019

Late diagnosis and high costs are key factors that negatively impact the care of cancer patients worldwide. Although the availability of biological markers for diagnosing cancer type is increasing, the cost and reliability of current tests remain a barrier to their routine use. There is a pressing need for accurate methods that enable early diagnosis and cover a broad range of cancers. Machine learning combined with RNA-seq expression analysis has shown promise for classifying cancer type; however, research remains inconclusive about which types of machine learning model are optimal. The suitability of five algorithms was assessed for the classification of 17 different cancer types. Each algorithm was fine-tuned and trained on the full array of 18,015 genes per sample, for 4,221 samples (75% of the dataset). The models were then tested on 1,408 samples (25% of the dataset) whose cancer types were withheld, to determine prediction accuracy. The results show that ensemble algorithms achieve 100% accuracy in classifying 14 of the 17 cancer types. The clustering and classification models, while faster than the ensembles, performed poorly because of the high level of noise in the dataset. When the features were reduced to a list of 20 genes, the ensemble algorithms maintained an accuracy above 95%, whereas the clustering and classification models did not.
Comment: 12 pages, 4 figures, 3 tables, conference paper: Computing Conference 2019, published at https://link.springer.com/chapter/10.1007/978-3-030-22871-2_6
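The headline result above is stated per cancer type (100% accuracy for 14 of the 17 types). A minimal sketch of that kind of bookkeeping follows, assuming "accuracy for a type" means the fraction of that type's held-out samples predicted correctly (per-class recall); the labels are hypothetical and this is not the paper's evaluation code.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly (per-class recall)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def perfectly_classified(accuracies):
    """Classes for which every held-out sample was classified correctly."""
    return sorted(label for label, acc in accuracies.items() if acc == 1.0)


# Hypothetical held-out labels for three made-up cancer types.
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "C", "C"]
acc = per_class_accuracy(y_true, y_pred)
print(acc)                        # {'A': 1.0, 'B': 0.5, 'C': 1.0}
print(perfectly_classified(acc))  # ['A', 'C']
```

Counting perfect classes this way says nothing about false positives into a class (here, one B sample was predicted as C), so it is usually reported alongside precision or a confusion matrix.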

arXiv.org e-Print Archive

Crossref

Prediction of protein binding sites in protein structures using hidden Markov support vector machine

Authors: A Henschel, A Koike, A Kouranov, A Porollo, A Rossi, AJ Bordner, B Wang, Bin Liu, Buzhou Tang, C Chothia, C Yan, C-T Chen, C-W Cheng, H Chen, H Kim, H Neuvirth, H-X Zhou, HX Zhou, I Ezkurdia, I Res, I Tsochantaridis, J Lafferty, J Song, J-L Chung, JD Fischer, JL Chung, JR Bradford, JW Torrance, K Henrick, L Holm, L Lo Conte, L Wang, Lei Lin, LR Rabiner, M Gribskov, M Vincent, M Šikić, MH Li, N Li, NJ Burgoyne, P Fariselli, Q Dong, Qiwen Dong, S Ahmad, S Liang, S Qin, SF Altschul, T Joachims, T Zhang, TH Dang, W Kabsch, WK Kim, X-w Chen, Xiaolong Wang, Xuan Wang, Y Altun, Y Liu, Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has mainly relied on well-known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, prediction performance is still too low for practical use, so new algorithms, theories and features need to be explored to improve it further.
Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine performs better than several state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods.
Conclusion: The improved prediction performance and computational efficiency of the method can be attributed to three factors. First, the relation between the labels of neighbouring residues is useful for protein binding site prediction. Second, the kernel trick is very advantageous in this field. Third, by using the cutting-plane algorithm, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples.
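The abstract credits part of the gain to modelling the relation between labels of neighbouring residues. In a hidden Markov SVM, prediction picks the label sequence that maximizes a score made of per-residue terms plus label-to-label transition terms, which is solved with Viterbi decoding. The sketch below shows only that decoding step, assuming the per-residue and transition scores are already given; the numbers are hypothetical stand-ins for learned weights, and training (the cutting-plane step) is not shown.

```python
def viterbi(emission, transition):
    """Highest-scoring label path under an additive per-residue + transition score.

    emission[i][k]: score of giving residue i label k (0 = non-binding, 1 = binding).
    transition[j][k]: score of label j being followed by label k on the next residue.
    """
    n = len(emission)
    if n == 0:
        return []
    labels = range(len(emission[0]))
    score = list(emission[0])  # best path score ending in each label at residue 0
    backpointers = []
    for i in range(1, n):
        new_score, pointers = [], []
        for k in labels:
            best_j = max(labels, key=lambda j: score[j] + transition[j][k])
            new_score.append(score[best_j] + transition[best_j][k] + emission[i][k])
            pointers.append(best_j)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    last = max(labels, key=lambda k: score[k])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    return path[::-1]


# Residue 2 weakly prefers "non-binding" on its own, but its neighbours look like binding sites.
emission = [[0.0, 2.0], [0.5, 0.0], [0.0, 2.0]]
no_coupling = [[0.0, 0.0], [0.0, 0.0]]
smoothing = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical reward for neighbours sharing a label
print(viterbi(emission, no_coupling))  # [1, 0, 1]
print(viterbi(emission, smoothing))    # [1, 1, 1]
```

With zero transition scores each residue is labelled independently; once neighbouring labels are rewarded for agreeing, the middle residue is pulled into the binding patch, which is the kind of context an independent per-residue SVM cannot use.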

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster

Authors: A Singhania, A Sokolov, AE Lobley, B Marita, BR Graveley, Cen Wan, CF Wu, Christine A. Orengo, D Barrell, D Cozzetto, D Fristrom, D Sutherland, David T. Jones, DJ Montell, F Minneci, Federico Minneci, I Ezkurdia, J Friedman, JB Weiss, JC Costello, JG Lees, Jonathan G. Lees, JW Truman, L Breiman, L Lan, M Ashburner, M Friedrich, NK Cho, P Radivojac, P Tomancak, R Cagan, S Hunter, S Roy, SD Hooper, T Bossing, T Chang, T Cover, T Kojima, TR Li, VR Chintapalli, Yanay Ofran, YX Jiang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/10/2017

Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction but struggle to provide useful annotations relating to biological process functions, owing to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance in predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with sequence-derived features alone. We also observe that combining expression-based and sequence-based features further improves accuracy in predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

Crossref

Directory of Open Access Journals

UCL Discovery

Birkbeck Institutional Research Online

Protein docking prediction using predicted protein-protein interface

Authors: A Berchanski, A Porollo, A Szilagyi, A Tovchigrechko, AA Bogan, AM Bonvin, B Huang, B Pierce, Bin Li, C Dominguez, C Zhang, CL Hutchinson, CL Lo, D Eisenberg, D Fischer, D Kozakov, D La, D Schneidman-Duhovny, Daisuke Kihara, DR Caffrey, DW Ritchie, E Karaca, EJ Gardiner, EV Pletneva, F Jiang, F Pazos, FK Pettit, GS Anand, H Hwang, H Neuvirth, H Tjong, H Wolfson, HA Gabb, HM Berman, HX Zhou, I Andre, I Ezkurdia, I Halperin, I Kufareva, I Mihalek, I Res, J Esquivel-Rodriguez, J Janin, J Mintseris, JI Garzon, JJ Gray, JR Bradford, K Henrick, K Wiehe, L Giot, M Meyer, M Tress, MF Lensink, MH Li, N Andrusier, NA Meenan, NJ Burgoyne, O Schueler-Furman, P Aloy, P Heuser, P Uetz, R Chen, R Das, R Mendez, RB Russell, RC Edgar, RD Finn, RT Bradshaw, S Dhungana, S Jones, S Liang, S Qin, SH Speck, SJ de Vries, SR Comeau, SS Negi, SY Huang, T Ito, T Lazaridis, Uniprot Consortium, V Chelliah, V Collura, V Venkatraman, W Kabsch, W Tong, WL Delano, X Li, Y Inbar, Y Shen, Z Shentu
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Many important cellular processes are carried out by protein complexes. To provide physical pictures of interacting proteins, many computational protein-protein prediction methods have been developed in the past. However, it is still difficult to identify the correct docking complex structure within the top ranks among alternative conformations.
Results: We present a novel protein docking algorithm that uses imperfect protein-protein binding interface prediction to guide protein docking. Since the accuracy of protein binding site prediction varies from case to case, the challenge is to develop a method that does not deteriorate but improves docking results when using a binding site prediction that may not be 100% accurate. The algorithm, named PI-LZerD (using Predicted Interface with Local 3D Zernike descriptor-based Docking algorithm), is based on a pairwise protein docking prediction algorithm, LZerD, which we developed earlier. PI-LZerD first performs docking prediction using the provided protein-protein binding interface prediction as constraints, followed by a second round of docking with updated docking interface information to further improve the docking conformation. Benchmark results on bound and unbound cases show that PI-LZerD consistently improves docking prediction accuracy compared with docking without binding site prediction or with binding site prediction used only as a post-filter.
Conclusion: We have developed PI-LZerD, a pairwise docking algorithm that uses imperfect protein-protein binding interface prediction to improve docking accuracy. PI-LZerD consistently showed better prediction accuracy than alternative methods in a series of benchmark experiments, including docking with actual docking interface site predictions as well as unbound docking cases.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Purdue E-Pubs

Characterization of pathogenic germline mutations in human Protein Kinases

Authors: A Baudot, A Fiser, A Hamosh, A Rausell, A Torkamani, A Zankl, Alfonso Valencia, Andrew CR Martin, Anja Baresic, C Ferrer-Costa, C Greenman, C Mao, Christine A Orengo, CT Porter, D Vitkup, E Jain, F Pazos, FS Collins, G López, G Manning, HM Berman, I Ezkurdia, J Hurst, J Izarzugaza, J Ptacek, JA Ubersax, JDR Knight, JG Leroy, Jose MG Izarzugaza, K Garber, LD Wood, Lisa EM Hopcroft, M Caceres, M Krallinger, MJ Karkkainen, P Taillon-Miller, P Yue, PA Futreal, R Fonseca, RC Edgar, S Forbes, SF Altschul, ST Sherry, VG Krishnan, Z Wang
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Protein Kinases are a superfamily of proteins involved in crucial cellular processes such as cell cycle regulation and signal transduction. Accordingly, they play an important role in cancer biology. To contribute to the study of the relation between kinases and disease, we compared pathogenic mutations with neutral mutations, extending our previous analysis of cancer somatic mutations. First, we analyzed native and mutant proteins in terms of amino acid composition. Second, we characterized mutations according to their potential structural effects. Finally, we assessed the location of the different classes of polymorphisms with respect to kinase-relevant positions in terms of subfamily specificity, conservation, accessibility and functional sites.
Results: Pathogenic Protein Kinase mutations perturb essential aspects of protein function, including disruption of substrate binding and/or effector recognition at family-specific positions. Interestingly, these mutations in Protein Kinases tend to avoid structurally relevant positions, which represents a significant difference from the average distribution of pathogenic mutations in other protein families.
Conclusions: Disease-associated mutations display clear differences from neutral mutations: several amino acids are specific to each mutation type, different structural properties characterize each class, and the distribution of pathogenic mutations within the consensus structure of the Protein Kinase domain differs substantially from that of non-pathogenic mutations. This preferential distribution confirms previous observations about the functional and structural distribution of the controversial cancer driver and passenger somatic mutations and their use as a proxy for studying the involvement of somatic mutations in cancer development.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Full-text Institutional Repository of the Ruđer Bošković Institute

Enlighten

Protein-Protein Interaction Site Predictions with Three-Dimensional Probability Distributions of Interacting Atoms on Protein Surfaces

Authors: A Koike, A Porollo, AA Bogan, An-Suei Yang, Attila Gursoy, BJ McConkey, BW Matthews, CC Chang, CD Manning, Ching-Tai Chen, CJC Burges, CM Yu, CT Chen, DE Rumelhart, DT Chang, ED Levy, Ei-Wen Yang, F Glaser, F Rodier, FB Sheinerman, G Moont, H Neuvirth, Hung-Pin Peng, HX Zhou, I Ezkurdia, I Kufareva, I Res, IS Moreira, J Janin, Jeng-Yih Chang, Jhih-Wei Jian, JM Elkins, Jun-Bo Chen, K Henrick, K Levenberg, Keng-Chang Tsai, L Breiman, L Jiang, L Lo Conte, M Reidmiller, M Riedmiller, M Sikic, MH Li, MN Wass, N Tuncbag, O Keskin, P Chakrabarti, PJ Kundrotas, QC Zhang, RA Laskowski, S Engelen, S Jones, S Sacquin-Mora, Shinn-Ying Ho, SJ de Vries, SJ Hubbard, SS Negi, Wen-Lian Hsu, X Gallet, Y Murakami, Y Ofran
Publication venue: Public Library of Science
Publication date: 06/06/2012

Protein-protein interactions are key to many biological processes. Computational methods devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools for gaining insight into the biological functions of proteins and for developing therapeutics that target PPI sites. One general feature of PPI sites is that the core regions of the two interacting protein surfaces are complementary to each other, resembling the interior of proteins in packing density and in the physicochemical nature of their amino acid composition. In this work, we simulated these physicochemical complementarities by constructing three-dimensional probability density maps of non-covalently interacting atoms on protein surfaces. The interaction probabilities were derived from the interiors of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to PPI sites. The trained PPI site predictors were cross-validated on the training cases (432 proteins) and tested on an independent dataset (142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity and specificity were 0.753, 0.519, 0.677 and 0.779, respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors for identifying PPI sites on protein surfaces. In particular, PPI site prediction accuracy increases with the size of the PPI site and with the hydrophobicity of the amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence.
The results indicate that physicochemical complementarity patterns on protein surfaces are important determinants of PPIs, and that a substantial portion of PPI sites can be predicted correctly using physicochemical complementarity features based on non-covalent interaction data derived from protein interiors.
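The test-set figures above (MCC, accuracy, precision, sensitivity, specificity) are all functions of the residue-level confusion matrix. The sketch below uses the standard definitions, assuming "positive" means a residue labelled as a PPI site; the counts in the example are made up for illustration, since the abstract does not report the underlying counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Residue-level metrics from confusion-matrix counts (positive = interface residue)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; set to 0 when any marginal total is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


# Made-up counts for illustration only (not the paper's data).
m = binary_metrics(tp=50, fp=10, tn=30, fn=10)
print(m)
```

Because interface residues are usually a minority of surface residues, accuracy alone can look good for a weak predictor; MCC uses all four cells of the confusion matrix, which is why it is often quoted first for this task.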

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Roles of residues in the interface of transient protein-protein complexes before complexation

Authors: A Goede, AA Bogan, AG Murzin, B Lee, BE Suzek, C Chothia, D Rajamani, D Reichmann, E Krissinel, ED Levy, GR Smith, H Hwang, H Zhu, HM Berman, I Ezkurdia, IM Nooren, J Janin, J Mintseris, J Schymkowitz, JA Capra, JD Thompson, JH Lakey, JR Perkins, K Rother, L Holm, L Lo Conte, M Vidal, N Rekha, N Tuncbag, O Keskin, ON Yogurtcu, P Chakrabarti, R Guerois, RP Bahadur, S Ansari, S De, S Jones, S Miller, S Parthasarathy, S Sonavane, S Vajda, SF Altschul, SJ Hubbard, T Clackson, X Li, YS Choi, Z Yuan
Publication venue: Nature Publishing Group
Publication date: 26/03/2012

Transient protein-protein interactions play crucial roles in all facets of cellular physiology. Here, using an analysis of known 3-D structures of transient protein-protein complexes, their corresponding uncomplexed forms and energy calculations, we seek to understand the roles of protein-protein interfacial residues in the unbound forms. We show that there are conformationally nearly invariant and evolutionarily conserved interfacial residues that are rigid and account for ∼65% of the core interface. Interestingly, some of these residues contribute significantly to the stabilization of the interface structure in the uncomplexed form. Such residues have a strong energetic basis for a dual role: stabilizing the structure of the uncomplexed form as well as that of the complex once formed, while maintaining their rigid nature throughout. This feature is evolutionarily well conserved at both the structural and sequence levels. We believe this analysis has general bearing on the prediction of interfaces and on understanding molecular recognition.

Crossref

PubMed Central

Open Access Repository of IISc Research Publications

Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information

Authors: A Porollo, AJ Bordner, B Wang, BD Alberts, C Cortes, C Sander, ED Levy, F Glaser, F Pazos, H Chen, H Zhou, HM Berman, HS Wong, I Ezkurdia, I Res, J Chung, J Janin, J Kittler, J Kyte, J Mihel, JC Bezdek, Jinyan Li, JR Bradford, KS Thorn, L Lo Conte, LI Kuncheva, LK Hansen, M Charton, M Guharoy, M Sikic, N H, P Baldi, P Chakrabarti, P Chen, P Cherepanov, P Fariselli, Peng Chen, Q Dong, R Singh, RA Laskowski, RD Pascual-Marqui, RM Kini, RP Bahadur, S Jones, SJ de Vries, T Friedrich, T Kohonen, TA Larsen, TJ Bollenbach, Uni-Prot-Consortium, V Chelliah, W Kauzmann, X Du, X Gallet, XW Chen, Y Murakami, Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Protein-protein interactions play essential roles in determining protein function and in drug design. Numerous methods have been proposed to recognize their interaction sites; however, because of the high cost, only a small proportion of protein complexes have been successfully resolved. It is therefore important to improve the performance of predicting protein interaction sites from primary sequence alone.
Results: We propose a new way to construct an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, in which SVMs are trained on different pairs of positive (interface sites) and negative (non-interface sites) subsets. The subsets, of roughly equal size, are grouped in the order of accessible surface area change before and after complexation. A self-organizing map (SOM) technique is applied to group similar input vectors, making the identification of interface residues more accurate. An ensemble of ten SVMs achieves an MCC improvement of around 8% and an F1 improvement of around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profiles outperforms models based on the sequence profile or the hydropathy scale alone. Because our method uses a small number of features to encode the input vectors, our model is simpler, faster and more accurate than existing methods.
Conclusions: The integrative profile combining hydrophobic and evolutionary information contributes most to protein-protein interaction prediction. The results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves prediction performance.
Availability: Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm
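The abstract reports that an ensemble of SVMs outperforms individual SVMs but does not say here how their outputs are combined. One common option is a per-residue majority vote; the sketch below assumes that rule purely for illustration (it is not confirmed as the authors' method), and the tie-breaking toward "non-interface" and the example outputs are hypothetical.

```python
def majority_vote(votes):
    """Combine per-residue 0/1 predictions from several classifiers by strict majority.

    votes[c][i] is classifier c's label for residue i (1 = interface, 0 = non-interface).
    Ties fall back to 0 (non-interface), an arbitrary choice for this sketch.
    """
    if not votes:
        raise ValueError("need at least one classifier")
    n = len(votes[0])
    if any(len(v) != n for v in votes):
        raise ValueError("all classifiers must label the same residues")
    return [1 if 2 * sum(v[i] for v in votes) > len(votes) else 0 for i in range(n)]


# Three hypothetical SVM outputs over four residues.
svm_outputs = [[1, 0, 1, 0],
               [1, 1, 0, 0],
               [0, 0, 1, 1]]
print(majority_vote(svm_outputs))  # [1, 0, 1, 0]
```

Training each member on a differently balanced positive/negative subset, as the abstract describes, makes the members' errors less correlated, which is what lets a vote like this beat any single member; averaging SVM decision values instead of hard labels is a common alternative.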

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

OPUS - University of Technology Sydney

PubMed Central